-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add robust median to gopher filter #98
Add robust median to gopher filter #98
Conversation
Co-authored-by: @TTTTao725
…KennethEnevoldsen/dolma into robust-median-for-gopher-filter
Co-authored-by: TTTTao725 <[email protected]>
…KennethEnevoldsen/dolma into robust-median-for-gopher-filter
Co-authored-by: TTTTao725 <[email protected]>
…KennethEnevoldsen/dolma into robust-median-for-gopher-filter
…KennethEnevoldsen/dolma into robust-median-for-gopher-filter
Updated in accordance with failing tests |
0, | ||
self.character_count, | ||
type="median_word_length", | ||
score=self.median_word_length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given than median_word_length
is bool | float
, wouldn't this make score
potentially a bool
? score is supposed to be a float
, so we would have to cast back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it starts out as a False, so tried to match the existing pattern. However the median can be undefined (empty list) but multiple value could represent that (np.nan, 0, False). I would probably go for np.nan or None if that is valid?
tnx!! added a comment, looks good otherwise. |
for the python style error, you should be able to run |
Hi @soldni fixed the style errors! |
@soldni it seems like this is awaiting an approval for the tests to be run |
Hey @KennethEnevoldsen! I've made a few more changes:
I've approved this PR, lmk if it looks good to you too before I merge! |
Thanks for taking care of that. Everything looks good to me |
This pull request adds a robust median function to the gopher filter, which would otherwise fails on empty docs such as "" or " ".
Sorry for the multiple commit wanted to add @TTTTao725 as a co-author as he found the bug.